NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps

Abstract

Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power consumption becomes a problem for real-time mobile applications. We propose a flexible and efficient CNN accelerator architecture which can support the implementation of SOA CNNs in low-power and low-latency application scenarios. This architecture exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across a wide range of convolutional network kernel sizes and numbers of input and output feature maps. We implemented the proposed architecture on an FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs, ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. In RTL simulations in a 28 nm process with a clock frequency of 500 MHz, the NullHop core reaches over 450 GOp/s and an efficiency of 368%, while maintaining over 98% utilization of the MAC units and achieving a power efficiency of over 3 TOp/s/W in a core area of 5.8 mm².
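To make the idea of exploiting activation sparsity concrete, the following is a minimal sketch, not the NullHop encoding itself (the abstract does not specify it). It assumes a simple scheme in which a feature map is stored as a binary mask plus a list of its non-zero values, and in which multiply-accumulate (MAC) operations are issued only for non-zero activations. The function names `compress`, `decompress`, and `sparse_dot` are hypothetical and chosen for illustration only.

```python
import numpy as np

def compress(fmap):
    """Split a feature map into a binary mask and its non-zero values (assumed scheme)."""
    mask = fmap != 0
    return mask, fmap[mask]

def decompress(mask, values):
    """Rebuild the dense feature map from the mask and the non-zero values."""
    out = np.zeros(mask.shape, dtype=values.dtype)
    out[mask] = values
    return out

def sparse_dot(mask, values, weights):
    """Dot product that multiplies only where the activation is non-zero.

    Returns the result and the number of MACs actually performed.
    """
    selected_weights = weights[mask]  # weights paired with non-zero activations
    return float(np.dot(values, selected_weights)), int(values.size)

# A ReLU output typically contains many zeros; here 4 of 6 entries are zero.
fmap = np.maximum(np.array([[-1.0, 2.0, -3.0], [0.5, -0.2, 0.0]]), 0.0)
weights = np.arange(6, dtype=float).reshape(2, 3)

mask, values = compress(fmap)
result, macs = sparse_dot(mask, values, weights)
print(result, macs, fmap.size)
```

In this toy case only two of the six possible MACs are performed and only two activation values need to be stored alongside the mask, which illustrates how skipping zeros can reduce both compute time and memory traffic.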
